KeratoEL: Detection of keratoconus using corneal parameters with ensemble learning

Abstract
Background and Aims: Keratoconus is a progressive eye condition in which the normally round cornea thins and bulges outwards into a cone shape. This irregular shape causes light to scatter in multiple directions as it enters the eye, leading to distorted vision, increased sensitivity to light and frequent changes in the prescription of glasses or contact lenses. Detecting keratoconus at an early stage remains difficult.
Methods: The study proposes an ensemble-based machine learning (ML) technique named KeratoEL to detect keratoconus at an early stage. The proposed KeratoEL model combines four basic machine learning algorithms, namely support vector machine (SVM), decision tree (DT), random forest (RF) and artificial neural network (ANN). Before the ML model is employed for keratoconus detection, the data set is first preprocessed manually by eliminating features that do not contribute any significant value to predicting the class. The output feature is then labelled into three different classes, and the Extra Trees Classifier is used to identify the important features. The features are sorted in descending order of importance, and the top 45, 30, and 15 features are taken as input datasets against the output. Finally, different machine learning models are tested using these input datasets and performance metrics are measured.
Results: The proposed model obtains 98.0%, 98.9% and 99.8% accuracy for the top 45, 30, and 15 features, respectively. Overall, the experimental results show that the proposed ensemble model outperforms the existing machine learning models.
Conclusion: The proposed KeratoEL model effectively detects keratoconus at an early stage by combining SVM, DT, RF, and ANN algorithms, demonstrating superior performance over existing models. These results underscore the potential of the KeratoEL ensemble approach in enhancing early detection and treatment of keratoconus.


| INTRODUCTION
Keratoconus is a corneal disorder marked by gradual thinning, leading to the protrusion of the cornea and a decline in visual acuity, without an inflammatory component.1 Typically, keratoconus manifests through progressively worsening myopia and astigmatism, resulting in suboptimal vision correction with glasses, challenges in adapting to contact lenses, visual exhaustion, and frequent headaches. The occurrence of keratoconus can vary, with a prevalence ranging from 0.2 to 4.790 cases per 100,000 individuals.2 Keratoconus may impact a single eye or both eyes simultaneously, exhibiting varying degrees of progression. Repetitive eye rubbing is widely recognized as the primary contributing factor to the progress of this disease.3 Keratoconus is typically diagnosed through the evaluation of corneal topography and the analysis of specific biomechanical properties of the cornea.4 The symptoms of keratoconus may vary depending on the stage of the disease. Considering that keratoconus typically manifests during puberty, primarily affecting children, the development and design of novel diagnostic tools for early disease detection could play a pivotal role in preventing or slowing down its progression.5 This, in turn, could safeguard the vision of young individuals. When the symptoms become obvious, an ophthalmologist can easily diagnose keratoconus. Nevertheless, diagnosing suspect cases or those in the initial stages of the disease can be challenging, as symptoms may not be obvious and a more thorough examination of corneal features such as topography, elevations, thickness, and biomechanical qualities is required. Multiple techniques have been proposed to identify keratoconus eyes using corneal topography data. The majority of methods, however, depend on a subjective interpretation of topographical maps, which is prone to observer bias.6
Typically, keratoconus diagnosis involves a manual assessment conducted by specialists. These experts carefully evaluate various corneal attributes to gather the necessary data for confirming the presence of keratoconus. To support specialists in this diagnosis, numerous researchers have adopted machine learning (ML) algorithms to reinforce ophthalmologists' assessments concerning the presence of keratoconus in patients.7 Combining the experience of specialists with the capability of ML to process diverse data types holds the promise of reliably and accurately detecting keratoconus, even in its subclinical stage.7 This advancement not only expands the treatment options available to patients but also reduces the necessity for surgical interventions. Despite numerous technological advancements, accurately detecting early-stage keratoconus remains a challenging endeavor.
ML algorithms have gained popularity over the years due to their wide-ranging advantages for keratoconus detection and classification. However, each ML algorithm comes with its own strengths and limitations. Building a hybrid ML model that integrates several models and utilizes their respective advantages therefore allows a superior model to be built. Herein, the primary objective is to exploit these benefits while reducing the constraints associated with any particular model. Therefore, this paper proposes a hybrid ML model, known as ensemble learning, for early-stage keratoconus detection and classification. The major contributions of this work are summarized below.
i) This study is focused on creating an efficient hybrid machine learning model for the early detection and classification of keratoconus, a critical step in improving patient outcomes. The proposed model, KeratoEL, combines SVM, DT, RF and ANN in the averaging ensemble method.
ii) The high-dimensional data has been reduced to the most optimal feature set by employing the Extra Trees Classifier. As a result, the operational model exhibits improved performance without added processing complexity.
iii) The optimal feature set is categorized into three distinct subsets, comprising the top 45, 30, and 15 features, respectively. A thorough analysis has been conducted to assess how the number of features influences the model's performance across a range of feature sets, thereby validating its overall impact.
iv) The model's performance is rigorously assessed by examining its effectiveness on both standardized and non-standardized data using various feature sets, to gain insight into how the model makes its decisions.
v) The study evaluates the effectiveness of the proposed ensemble model and compares it with state-of-the-art models. The results demonstrate the model's ability to generalize and its accuracy in making predictions.
The remainder of this paper is organized as follows. Section 2 provides a comprehensive literature review. Section 3 includes an in-depth overview of our proposed approach, including the pre-processing unit and the classification model. Sections 4 and 5 present a brief description of the basic machine learning models and the evaluation metrics employed for the analysis of our proposed model, respectively. Section 6 describes the experimental setup, and the overall performance of our proposed model is presented in Section 7. Finally, Section 8 draws the conclusions of our work.

| RELATED WORKS
The noninflammatory keratoconus frequently results in irreversible vision loss when it advances into the fourth decade of life. Considering the importance of its early onset, there is a growing need to develop tools and diagnostic techniques to detect keratoconus early, especially in the younger population, to stop or slow its progression. ML has become more popular in keratoconus detection. ML-aided keratoconus detection is reliable and unbiased, which is especially important in cases of early detection.8 Four sets of classifiers, including the multilayer perceptron (MLP),9 radial basis function network (RBFN),10 neural network (NN), and support vector machine (SVM),11 have been tested in keratoconus diagnosis. Among the four classifiers, the MLP achieves the highest accuracy of 92.2%, while the SVM has the lowest accuracy, which is 84.42%. A learning system based on a neural network that enables the diagnosis of keratoconus is presented in 12. The reported results show the best accuracy, 97.33%, on the test data set. However, the algorithm uses an excessive number of parameters, which makes it challenging to train and test. In 13, a binary decision tree (DT) is applied for keratoconus diagnosis from corneal topography data and obtains an accuracy of 95%. The drawback of the proposed scheme is the small data set. In 14, the authors suggested a categorization method employing corneal shape data from optical coherence tomography (OCT) based devices and achieved 92% accuracy with 244 eyes. Nonetheless, the study lacks details regarding the severity level of keratoconus in the eyes and whether it included cases at the early stages of the condition.
An automated keratoconus diagnosis technique utilizing artificial intelligence (AI) is presented in 15 with 91% accuracy. The algorithm utilizes a collection of topography images collected using a Pentacam16 that have been classified into two categories by experts (keratoconus eyes and non-keratoconus eyes). The drawback of the proposed method is that the training data set comprises only 82 images. The authors in 17 used the SVM method to distinguish between people with keratoconus and others who are healthy. This study utilized 860 eyes with Pentacam data, which were divided into five distinct groups: 454 eyes with keratoconus, 67 eyes with forme fruste, 28 eyes with astigmatism, 117 eyes following refractive surgery, and 194 normal eyes. This method analyzes 22 factors with an accuracy of 98.9%, 93.1% and 88.8% for three different classification tasks (keratoconus vs. normal eyes, forme fruste vs. normal eyes and all 5 groups). For identifying patients with early-stage keratoconus, a logistic regression statistical model was applied in 18. The sensitivity and specificity values in the application group were 85.0% and 86.7%, respectively. However, the model is applied only to young patients, overlooking its diagnostic applicability in older patients.
In 14, an SVM classifier based model is developed for disease detection using corneal measurements through the integration of a Scheimpflug camera and Placido corneal topography. The classifier exhibited outstanding accuracy, whether or not it incorporated data generated from the posterior corneal surface and corneal thickness. In both scenarios, the accuracy exceeded 95%. Furthermore, when incorporating both anterior and posterior corneal data, SVM achieved an improved accuracy of over 97% for normal eyes. Keratoconus diagnosis is aided by the use of a convolutional neural network (CNN) in the KeratoDetect algorithm.19 The algorithm has been reported to show 99.33% accuracy, establishing it as a reliable screening tool for ophthalmologists. Since collecting real eye topographies is difficult, the data set is generated by the SyntEye KTC model. However, the evaluation of the algorithm does not account for the bias that this synthetic data may introduce. The authors in 5 developed an ML algorithm for keratoconus diagnosis based on corneal imaging data. A number of ML algorithms were applied and tested against real medical data, specifically elevation, corneal topography and corneal pachymetry, obtained from OCT based corneal topography. The accuracy of the 25 different applied models varies from 62% to 94%, and cubic SVM achieved the highest accuracy of 94%. In 20, Random Forest (RF) has been utilized for classification among normal, suspect irregular and keratoconic corneas. The highest accuracy achieved in this study is 91.5%. However, the model relies on the classification of images into their respective groups. In 21, a CNN is applied for normal and keratoconic eye detection based on the axial map of the anterior corneal surface. The model utilizes the images of 3000 normal eyes and 3000 keratoconic eyes collected by a Pentacam. The model is able to achieve a maximum accuracy of 99.5%. However, this study did not consider the suspect keratoconus case during the evaluation.
From the literature review discussed so far, it is clear that there is significant interest and ongoing research in the area of keratoconus diagnosis using a variety of methods, including statistical models, ML, and AI. Although several studies have shown promising results, there is still a need to improve the accuracy and precision of these techniques and to address the issue of false positive detection. The potential of a hybrid model that combines discrete learning models and utilizes the benefits of each remains underdeveloped in this domain. Therefore, to utilize the advantages and reduce the limitations of discrete models, this work develops an efficient hybrid model by combining multiple models. This hybrid model is intended to perform better in early disease detection and intervention than discrete models.

| Data description
Using the SS-1000 CASIA OCT Imaging Methods (Tomey, Japan) and additional parameters from the electronic health record (EHR) system, we obtained corneal optical coherence tomography (OCT) images from 12,242 eyes of 3162 individuals. All data available at each instrument were collected without any preconditions. We then selected a single visit for each eye and excluded any eyes that lacked an Ectasia Status Index (ESI). In total, 3156 eyes satisfied this requirement. The average age of the participants was 69.7 (SD = 16.2) years, with almost 57% of the individuals being female.22 The ESI index of CASIA was used to create three screening labels: normal for values between 0 and 4, forme fruste keratoconus (or keratoconus suspect) for values between 5 and 29, and keratoconus for values of 30 or higher. This data set, labelled using the CASIA ESI, includes 390 keratoconus cases, 796 forme fruste keratoconus cases, and 1970 healthy eyes. The Belin-Ambrósio (BA) index and the instrument-guided screening index (ESI) have been found to have good agreement in the diagnosis of keratoconus.22 So, the data set, which we collected, has a size of 3162 records with 448 features. The last column is the target column, which the trained model learns to predict.

| Data cleaning
We had to drop several columns before training the models, because some of them contain a constant number, some contain null values, and some contain the same string in every record. None of these columns contributes to training our machine learning models. In total, 21 parameters were dropped from the original data set; they are presented in Table 1.
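The cleaning rule described above can be sketched with pandas. This is a minimal illustration, not the authors' code: the column names are hypothetical, and treating only fully null columns as droppable is an assumption, since the text does not state how partially null columns were handled.

```python
import numpy as np
import pandas as pd


def drop_uninformative_columns(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Drop feature columns that are entirely null or hold one repeated value."""
    to_drop = []
    for col in df.columns:
        if col == target:
            continue  # never drop the target column
        # nunique(dropna=True) is 0 for all-null columns and 1 for constant ones
        if df[col].nunique(dropna=True) <= 1:
            to_drop.append(col)
    return df.drop(columns=to_drop)


# Toy frame with hypothetical column names, for illustration only.
toy = pd.DataFrame({
    "pachy_min": [510.0, 480.0, 455.0],      # informative numeric feature
    "device_id": ["A1", "A1", "A1"],         # same string in every row
    "scan_flag": [1, 1, 1],                  # constant number
    "empty_col": [np.nan, np.nan, np.nan],   # only null values
    "ESI": [2, 12, 41],                      # target column
})
cleaned = drop_uninformative_columns(toy, target="ESI")
print(list(cleaned.columns))
```

Only the informative feature and the target survive in this toy example; on the real data set the same rule would be one plausible way to arrive at a list like the one in Table 1.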

| Target label encoding
Since this is a classification problem, the target column holds the ESI value, which ranges between 0 and 100 and is generated by the SS-1000 CASIA OCT Imaging Methods. We encoded the target parameter as three integers, 0, 1, and 2, where 0 means normal eye, 1 means suspect keratoconus eye, and 2 means keratoconus eye. Target label encoding is shown in Table 2. After encoding the target label, we obtained 10.3% keratoconus eyes, 24.9% suspect keratoconus eyes, and 64.8% normal eyes, as shown in Figure 2.
FIGURE 2 Existence of 3 different classes in the data set.
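A minimal sketch of this encoding, using the ESI thresholds given in the data description (0 to 4 normal, 5 to 29 suspect, 30 or higher keratoconus). The function name is hypothetical, and mapping any value strictly below 5 to the normal class is an assumption for non-integer ESI values.

```python
def encode_esi(esi: float) -> int:
    """Map an ESI value in [0, 100] to a class label: 0 normal, 1 suspect, 2 keratoconus."""
    if esi < 0 or esi > 100:
        raise ValueError(f"ESI out of range: {esi}")
    if esi < 5:       # 0 to 4: normal eye
        return 0
    if esi < 30:      # 5 to 29: suspect (forme fruste) keratoconus
        return 1
    return 2          # 30 or higher: keratoconus


labels = [encode_esi(v) for v in (0, 4, 5, 29, 30, 100)]
print(labels)
```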

| Data standardization
Data standardization is the process of transforming data into a common format or scale to make it consistent and comparable. It involves techniques like mean-centering, variance scaling, and categorical data encoding. Here, the standard scaler is used, which standardizes each feature by subtracting its mean and scaling it to a standard deviation of one. The standard score of a sample x is calculated as:

$$ z = \frac{x - u}{s} \tag{1} $$

where u is the mean of the training samples, and s is the standard deviation of the training samples.
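The standard score can be sketched directly in NumPy. The helper names below are hypothetical; the sketch assumes the population standard deviation (ddof = 0), which is what scikit-learn's StandardScaler uses, and it fits u and s on the training samples only so that the same transform can be reused on validation and test data.

```python
import numpy as np


def fit_standardizer(X_train: np.ndarray):
    """Estimate the per-feature mean u and standard deviation s from training data."""
    u = X_train.mean(axis=0)
    s = X_train.std(axis=0)          # population standard deviation (ddof=0)
    s = np.where(s == 0, 1.0, s)     # leave constant features unscaled
    return u, s


def standardize(X: np.ndarray, u: np.ndarray, s: np.ndarray) -> np.ndarray:
    """Apply z = (x - u) / s feature-wise."""
    return (X - u) / s


X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.0, 40.0]])

u, s = fit_standardizer(X_train)
Z_train = standardize(X_train, u, s)
Z_test = standardize(X_test, u, s)   # reuses the training statistics
```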

| Ensemble techniques
The method we have proposed is an ensemble learning method that employs four different classifiers: SVM, DT, RF, and ANN. SVM is well suited to high-dimensional data like the data set we used, is known for good performance in classification tasks, and is effective in handling small datasets. DT offers interpretability, allowing us to understand the decision-making process, and can handle mixed data types (categorical and numerical). RF combines multiple decision trees, which reduces overfitting, improves generalization, and makes it robust to outliers and noise. ANN is powerful for complex, nonlinear relationships, potentially capturing underlying patterns in the data that other algorithms might miss. The goal is to improve prediction accuracy by combining the predictions of these individual classifiers. The method begins by importing the required libraries, which are pandas, numpy, and scikit-learn (sklearn). The pandas library is used to manipulate and analyze data, while numpy is used for scientific computing. Machine learning algorithms for classification, regression, and clustering are available in the scikit-learn library.
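One way to sketch an averaging ensemble of these four classifiers in scikit-learn is soft voting, which averages the class probabilities predicted by each member. This is an illustration under stated assumptions rather than the authors' implementation: the data is synthetic (generated with make_classification), and all hyperparameters except the 128 and 64 hidden-layer sizes are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the top-15 corneal features.
X, y = make_classification(n_samples=600, n_features=15, n_informative=8,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Soft voting averages the predicted class probabilities of the members.
ensemble = VotingClassifier(
    estimators=[
        ("svm", SVC(probability=True, random_state=0)),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("ann", MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500,
                              random_state=0)),
    ],
    voting="soft",
)

# Standardization is fitted inside the pipeline, on the training split only.
model = make_pipeline(StandardScaler(), ensemble)
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)        # averaged class probabilities
test_accuracy = model.score(X_test, y_test)
```

Soft voting requires every member to expose class probabilities, which is why the SVC is created with probability=True.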

| EVALUATION METRICS
In this research, we have a two-fold objective: first, to classify keratoconus, and second, to evaluate and measure the effectiveness and performance of the proposed approach by comparing it with existing methods. Table 3 displays the potential outcomes of a confusion matrix in three-class classification. True Positive (TP), False Positive (FP), and False Negative (FN) are the three possible outcomes.

| Accuracy
In a three-class classification, accuracy is a measure of how many instances are classified correctly out of the total instances. The accuracy can be calculated using the following equation:

$$ \text{Accuracy} = \frac{TP_1 + TP_2 + TP_3}{\text{Total Instances}} \tag{2} $$

where
TP1: The number of instances classified correctly as belonging to Class 1.
TP2: The number of instances classified correctly as belonging to Class 2.
TP3: The number of instances classified correctly as belonging to Class 3.
Total Instances: The total number of instances in the data set.

FIGURE 3 Important features with their corresponding scores for prediction.
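As a small worked example of the accuracy definition, the sketch below sums the diagonal of a three-class confusion matrix and divides by the total count. The matrix values are hypothetical and are not results from this study; rows are assumed to be true classes and columns predicted classes.

```python
def accuracy_from_confusion(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    correct = sum(cm[k][k] for k in range(len(cm)))   # TP1 + TP2 + TP3
    total = sum(sum(row) for row in cm)               # total instances
    return correct / total


# Hypothetical confusion matrix, for illustration only.
cm = [
    [50, 2, 0],
    [3, 20, 1],
    [0, 1, 10],
]
accuracy = accuracy_from_confusion(cm)   # (50 + 20 + 10) / 87
print(round(accuracy, 4))
```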

| Precision
Precision is a performance metric that evaluates how well a model predicts the positive outcomes of a classification task, such as three-class classification. In the context of a three-class classification problem, precision can be defined independently for every class. The formula for precision for a specific class (e.g., Class 1) is as follows:

$$ \text{Precision (Class 1)} = \frac{TP_1}{TP_1 + FP_1} \tag{3} $$

where
True Positives (TP) for Class 1: The number of instances classified correctly as belonging to Class 1.
False Positives (FP) for Class 1: The number of instances that are incorrectly classified as Class 1 even though they are not Class 1.
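Per-class precision can be read off the columns of a confusion matrix, as in the following sketch. The matrix is hypothetical (rows are true classes, columns are predicted classes), and returning 0.0 when a class is never predicted is an assumed convention.

```python
def per_class_precision(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    precisions = []
    for k in range(n):
        tp = cm[k][k]
        # False positives: samples of other classes that were predicted as class k.
        fp = sum(cm[i][k] for i in range(n) if i != k)
        precisions.append(tp / (tp + fp) if (tp + fp) else 0.0)
    return precisions


# Hypothetical confusion matrix, for illustration only.
cm = [
    [50, 2, 0],
    [3, 20, 1],
    [0, 1, 10],
]
precision = per_class_precision(cm)   # [50/53, 20/23, 10/11]
```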

| Recall
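For a specific class (e.g., Class 1), recall is calculated as:

$$ \text{Recall (Class 1)} = \frac{TP_1}{TP_1 + FN_1} \tag{4} $$

where
True Positives (TP) for Class 1: The number of cases correctly classified as Class 1.
False Negatives (FN) for Class 1: The number of cases incorrectly classified as not belonging to Class 1 when they do.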
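Recall mirrors precision but works along the rows of the confusion matrix. The sketch below uses a hypothetical matrix (rows are true classes, columns are predicted classes) and assumes a recall of 0.0 for a class with no samples.

```python
def per_class_recall(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    recalls = []
    for k in range(n):
        tp = cm[k][k]
        # False negatives: samples of class k that were predicted as another class.
        fn = sum(cm[k][j] for j in range(n) if j != k)
        recalls.append(tp / (tp + fn) if (tp + fn) else 0.0)
    return recalls


# Hypothetical confusion matrix, for illustration only.
cm = [
    [50, 2, 0],
    [3, 20, 1],
    [0, 1, 10],
]
recall = per_class_recall(cm)   # [50/52, 20/24, 10/11]
```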

| F1 score
The F1 score is a prominent performance metric in classification tasks such as three-class classification. It combines precision and recall into a single metric, providing a balanced assessment of a model's ability to make accurate positive predictions while avoiding missing positive instances. The formula for the F1 score for a specific class (e.g., Class 1) is as follows:

$$ \text{F1 Score (Class 1)} = \frac{2 \times \text{Precision (Class 1)} \times \text{Recall (Class 1)}}{\text{Precision (Class 1)} + \text{Recall (Class 1)}} \tag{5} $$

where
Precision (Class 1): As explained earlier, this measures the accuracy of positive predictions for Class 1.
Recall (Class 1): As explained earlier, this measures the model's ability to correctly identify all positive instances for Class 1.
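The F1 score is the harmonic mean of precision and recall, which the short sketch below makes concrete. The input values are hypothetical (20 true positives, 3 false positives, 4 false negatives), and returning 0.0 when both inputs are zero is an assumed convention.

```python
def f1_score_for_class(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall for a single class."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one class: TP = 20, FP = 3, FN = 4.
precision = 20 / (20 + 3)
recall = 20 / (20 + 4)
f1 = f1_score_for_class(precision, recall)   # equals 2*TP / (2*TP + FP + FN) = 40/47
```

Because the harmonic mean is dominated by the smaller of its two inputs, a class with high precision but poor recall (or the reverse) still receives a low F1 score.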

| EXPERIMENTAL SETUP
In this paper, a novel ensemble-based machine learning model, KeratoEL, is proposed for the detection of keratoconus. Our proposed model primarily comprises two key components: a pre-processing unit and an ensemble model designed for the classification stage. Within the pre-processing unit, the initial step involves the transformation of features, where each feature is first encoded into a label format at the input layer. The second step of the system incorporates a dimensionality reduction component designed to mitigate the challenges posed by the curse of dimensionality. For this feature reduction task, the Extra Trees Classifier has been employed as the foundation. The Extra Trees Classifier assigns an importance score to each feature of the original data set, and only a subset of the top-ranked features is chosen for the detection model. This process is consistent for the training, validation and testing datasets. Subsequently, a data standardization process is executed. Following the completion of the pre-processing phase, the resulting reduced feature set is directed into the model. It is a heterogeneous ensemble technique that utilizes the advantages of four different classifiers, namely SVM, DT, RF and ANN. The operational flow diagram of the proposed model is shown in Figure 1.
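The feature reduction step can be sketched with scikit-learn's ExtraTreesClassifier and its feature_importances_ attribute. This is an illustration on synthetic data, not the authors' pipeline: the data set size, the number of trees, and the random seed are placeholders, and in practice the ranker should be fitted on the training split only to avoid leaking test information.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in for the high-dimensional corneal data (60 features here).
X, y = make_classification(n_samples=500, n_features=60, n_informative=10,
                           n_classes=3, random_state=0)

# Fit the ranker and read one importance score per feature.
ranker = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X, y)
importances = ranker.feature_importances_

# Sort feature indices by importance, highest first, then keep the top k.
order = np.argsort(importances)[::-1]
subsets = {k: X[:, order[:k]] for k in (45, 30, 15)}

for k, X_k in subsets.items():
    print(k, X_k.shape)
```

The same column indices in order would then be applied to the validation and test data so that all splits share one feature set.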

FIGURE 1 Proposed methodology with the ensemble technique.
TABLE 1 Dropped parameters and reason for dropping.

The important features are shown in Figure 3 with their corresponding scores for prediction.

| Decision tree (DT)
In a multi-class classification problem, decision trees work by creating a tree-like structure where each internal node represents a decision based on a specific feature, and each leaf node represents a class label. The algorithm recursively splits the data into smaller subsets based on the selected features until the subsets are pure or can no longer be split. The selected features are ranked based on their information gain, which measures how much a feature contributes to reducing the impurity in the data.

| Artificial neural network (ANN)
Our ANN is a feed-forward neural network with an input layer, two hidden layers, and an output layer. The input layer has a number of nodes equal to the number of input features in the data set. The two hidden layers have 128 and 64 nodes, respectively, and use the rectified linear unit (ReLU) activation function. The output layer has three nodes, which correspond to the three possible classes in the classification problem, and uses the softmax activation function to predict class probabilities.

| Random forest (RF)
Consider a data set with three classes for which the top 15 features for predicting these classes have been determined. The RF technique constructs numerous decision trees, each from a random selection of the features and the data. Each data point is fed through every decision tree, and each tree predicts the class of the data point depending on the features it employed. The mode of the classes predicted by all decision trees is the final prediction for that data point.
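The mode-based aggregation described for the random forest can be sketched in a few lines. The function name is hypothetical, and breaking ties in favour of the smallest class label is an assumption. Note that scikit-learn's RandomForestClassifier combines trees by averaging their predicted class probabilities rather than by a hard vote, so this sketch follows the description in the text, not that library's internals.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the most frequent class among per-tree predictions (the mode)."""
    counts = Counter(tree_predictions)
    best = max(counts.values())
    # Tie-break on the smallest label so the result is deterministic.
    return min(label for label, count in counts.items() if count == best)


# Hypothetical predictions from five trees for one eye (0 normal, 1 suspect, 2 keratoconus).
final_class = majority_vote([2, 1, 2, 0, 2])
print(final_class)
```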

Recall, also known as sensitivity or true positive rate, is another significant performance metric used in classification tasks such as three-class classification. It measures the ability of a model to correctly identify all positive instances within a given class. In the context of a three-class classification problem, recall can be defined independently for each class. For a specific class (e.g., Class 1), it is the number of true positives divided by the sum of true positives and false negatives.

7 | RESULT ANALYSIS AND DISCUSSION
The evaluation of the keratoconus detection model's effectiveness relies on its metric scores. To enhance clarity, this study's overall results are divided into two main phases. Phase 1 concentrates on the performance attributes of the proposed keratoconus detection scheme, examining various feature sets both with and without data standardization. In Phase 2, a comparative analysis is conducted, evaluating the overall results across multiple individual machine learning classifiers, and contrasting them with state-of-the-art models in the field of keratoconus detection.

TABLE 3 Confusion matrix for three-class classification.
TP: The number of instances correctly classified as belonging to each of the three classes (Class 0, Class 1, and Class 2).
FP: The number of instances classified incorrectly as belonging to each class when they do not.
FN: The number of instances classified incorrectly as not belonging to each class.

PAUL ET AL.

FIGURE 4 Confusion matrix for KeratoEL with data standardization for top (A) 45 features, (B) 30 features, and (C) 15 features.
FIGURE 5 Confusion matrix for KeratoEL without data standardization for top (A) 45 features, (B) 30 features, and (C) 15 features.
FIGURE 6 ROC curves for KeratoEL with data standardization for top (A) 45 features, (B) 30 features, and (C) 15 features.
FIGURE 7 ROC curves for KeratoEL without data standardization for top (A) 45 features, (B) 30 features, and (C) 15 features.

7.1 | Phase 1: Experimental results for different feature sets with and without data standardization
The effectiveness of the Keratoconus Detection System (KDS) model is gauged by assessing its performance across a variety of metrics. Higher values of accuracy, precision, recall, and F1-score, as defined in Equations (2-5), indicate better detection capabilities. Table 5 presents the experimental results of keratoconus detection. To facilitate meaningful comparative analysis, the results of individual metrics are compared for different feature sets with and without data standardization. This comparison allows for an assessment of how data standardization impacts the model's performance across various evaluation metrics.
TABLE 4 Different parameters of different models with their values to train them.
Abbreviations: ANN, artificial neural network; DT, decision tree; RF, random forest; SVM, support vector machine.
TABLE 5 Experimental results for different feature sets with and without data standardization.
Note: Bold values indicate the best performances in terms of different performance metrics.
In the field of machine learning, the primary objective of feature selection is to choose the most pertinent features to enhance model performance, mitigate overfitting, expedite computation, and improve interpretability. In our study, Feature Set 1 comprises the top 45 features, while Feature Set 2 and Feature Set 3 consist of the top 30 and 15 features, respectively, out of 448 features. From Table 5, it can be seen that, with and without data standardization, the precision of the KeratoEL model varies between 0.98 and 1 for different features, whereas the lowest precision of 0.79 is achieved for the SVM algorithm without data standardization. Moreover, in the KeratoEL model, the recall metric ranges from 0.97 to 1 across various features, whereas AdaBoost attains its lowest recall score of 0.70 when applied to standardized data. In addition, the F1-score ranges from 0.97 to 1 in the KeratoEL model, while AdaBoost achieves its lowest F1-score of 0.79, whether the data is standardized or not.

ROC curve analysis is a graphical method used to assess the performance of classification models. It plots the trade-off between true positive rate (sensitivity) and false positive rate (1 - specificity) at different decision thresholds. A ROC curve that rises steeply and lies closer to the top-left corner indicates a better model.

Figures 8 and 9 depict the accuracy of different classifiers for different features in the data set with and without data standardization. Figure 8 shows that the proposed KeratoEL model is able to achieve the highest accuracy of 99% with 15 features, whereas the AdaBoost algorithm, utilizing the top 45 features, attains the lowest accuracy of 87% without data standardization. Moreover, the KeratoEL model with the top 15 features has the highest accuracy of 99.8%, while the AdaBoost algorithm with the top 45 features has the lowest accuracy of 89.6% for standardized data, as shown in Figure 9. Standardizing the data yields a slight improvement in performance. Furthermore, reducing the number of features can improve model performance. The above analysis indicates that the proposed KeratoEL model is more effective and efficient than a single ML model.

Table 9 shows the comparison of the proposed ensemble model with state-of-the-art models in terms of accuracy and AUC. The performance of the proposed KeratoEL for the three-class problem (Normal Eye, Suspect Keratoconus, and Keratoconus) is compared with previous studies. The authors of 6,20 have used a similar data set for this work, where they have utilized t-SNE and RF methods and obtained accuracy of 94.10% and 94%, respectively. The highest accuracy and AUC of KeratoEL are 99.8% and 0.99, respectively, which are better than previous works, whether the data set is similar or different. This comparison indicates that the proposed KeratoEL outperforms the previous studies considered.

Evaluation metrics analysis of different classifiers for the top 45 features without and with data standardization.
Evaluation metrics analysis of different classifiers for the top 30 features without and with data standardization.
Evaluation metrics analysis of different classifiers for the top 15 features without and with data standardization.
FIGURE 8 Comparison graph of different classifiers for three different feature groups (45, 30, and 15) with their corresponding accuracy score (without standardization).
FIGURE 9 Comparison graph of different classifiers for three different feature groups (45, 30, and 15) with their corresponding accuracy score (with standardization).
TABLE 9 Comparison with state-of-the-art models.
Note: Bold values indicate best and worst cases.
Abbreviations: ANN, artificial neural network; CNN, convolutional neural network; DT, decision tree; RF, random forest; SVM, support vector machine.
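The ROC analysis discussed in this section can be sketched for one class against the rest. The labels and scores below are hypothetical and are not taken from the study; the sketch sweeps the decision threshold over the predicted scores for the keratoconus class (label 2) and integrates the curve with the trapezoidal rule.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists for binary labels, sweeping the threshold over all scores."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    fpr, tpr = [0.0], [0.0]
    for t in sorted(set(scores), reverse=True):
        # Everything scored at or above the threshold is predicted positive.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        tpr.append(tp / pos)
        fpr.append(fp / neg)
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))


# Hypothetical three-class labels and predicted probabilities for class 2.
labels = [2, 2, 0, 2, 1, 0]
scores_class2 = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

# One-vs-rest: class 2 is positive, classes 0 and 1 are negative.
y_true = [1 if label == 2 else 0 for label in labels]
fpr, tpr = roc_points(y_true, scores_class2)
auc_class2 = trapezoid_auc(fpr, tpr)
```

Repeating this for each of the three classes gives one curve per class, which is one common way to extend ROC analysis to a multi-class problem.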